sdk(llm): Stop model-name whack-a-mole: revert to core family substring matching #879

enyst · 2025-10-23T18:22:37Z

@xingyaoww I'll fix the conflicts etc., but I'd love your attention here for a bit: this PR simplifies things in agent-sdk, because we lost recognition of "claude-x" in model names whenever the provider prefix was more complex than just "anthropic/". In other words, we lost the Bedrock match. This fixes it.

The cause is that at some point recently we changed from a simple match of the core name ("claude-3-5") anywhere in the full provider/model string (which is accurate! No Llama will call itself "claude-3-5" 😅) to code using normalization plus globbing, which forced us to add half a million patterns to account for the variety of forms in which "claude-3-5" shows up out there, and it keeps missing some, of course.

I looked into it and I believe that was a mistake I made. This is a revert.

This fixes some Bedrock reports from Slack. I'd love to merge this to stop the whack-a-mole, rather than hardcoding more stuff into these patterns. I think we can maybe also take the patterns out, but I would still love this revert first.

Summary

- Return to simple core family substring matching across the full raw model string
- Remove fnmatch/globbing and stop using normalization for feature detection
- Update pattern tables to pure substrings (no wildcards)
- Add tests to validate e.g. Bedrock-style names (the messiest)

What & Why

This PR restores the durable invariant:

if a meaningful family token (e.g., 'claude-3-5-sonnet', 'gpt-4o', 'o3', 'gemini-2.5-pro') appears anywhere in the model string
=> the feature applies.

"bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0" => "claude-3-5-sonnet" applies.

This eliminates the pattern maintenance whack-a-mole caused by dotted prefixes and provider-specific suffixes and aligns again with proven behavior in the wild.

A recent refactor introduced fnmatch-based globbing over a normalized basename. This unintentionally diverged from the prior V0 behavior, where we effectively matched by substring on the full provider/model name. That change broke real-world cases, notably with AWS Bedrock, where names embed dotted vendor prefixes and version suffixes inside the basename (e.g., bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0). Patterns like 'claude-3-5-sonnet*' stopped matching after normalization and globbing, because a glob is anchored at the start of the basename and the basename now starts with 'anthropic.'.
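To make the failure mode concrete, here is a minimal sketch comparing the two approaches on the Bedrock name above. The `basename` line is only a hypothetical stand-in for the normalization step (the real `normalize_model_name()` may do more), and `fnmatchcase` is used so the result does not depend on the platform.

```python
from fnmatch import fnmatchcase

model = "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
pattern = "claude-3-5-sonnet*"

# Assumption: normalization keeps only the part after the last "/".
basename = model.rsplit("/", 1)[-1].lower()

# A glob must match the whole string from its first character, so the
# dotted vendor prefix "anthropic." defeats the pattern.
glob_hit = fnmatchcase(basename, pattern)

# A substring check on the full raw string does not care what surrounds
# the family token.
substring_hit = "claude-3-5-sonnet" in model.lower()

print(f"glob: {glob_hit}, substring: {substring_hit}")
```

Making the glob work would take a leading wildcard or an extra pattern per provider form, which is exactly the maintenance burden this PR removes.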

Implementation Details

model_matches()
- Lowercase and strip the incoming model string, then perform substring checks against the full raw string (so matching is case-insensitive)
- For each pattern, lowercase/strip and drop any trailing '*' (migration aid); treat the remaining token as a plain substring
- Return True on first match; False otherwise
- No use of normalize_model_name() here

Pattern tables: remove '*'
- FUNCTION_CALLING_PATTERNS, REASONING_EFFORT_PATTERNS, PROMPT_CACHE_PATTERNS, SUPPORTS_STOP_WORDS_FALSE_PATTERNS, RESPONSES_API_PATTERNS now contain pure substrings
- Provider-qualified entries remain supported by virtue of substring matching against the raw string

normalize_model_name()
- Not used by matching. Tests exercising normalization for matching were removed to avoid confusion

Tests
- Remove wildcard expectations; adapt to pure substring semantics
- Ensure Bedrock coverage: e.g., 'bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0' enables function calling and prompt cache
- Verify provider-qualified substrings gate as expected (e.g., the pattern 'openai/gpt-4o' matches the model 'openai/gpt-4o' but not a model under 'anthropic/')
- Keep conservative defaults for unknown models
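The matching rules listed above can be sketched roughly as follows. This is an illustration of the described behavior, not the SDK's actual code: the table contents are placeholder entries, and the empty-token guard is my own assumption (an empty string is a substring of everything, so it would otherwise match every model). Note that a later commit in this thread removes the trailing-'*' handling and uses patterns exactly as provided.

```python
from collections.abc import Iterable

# Illustrative entries only; the real table in the SDK is longer.
PROMPT_CACHE_PATTERNS = ["claude-3-5-sonnet", "claude-3-7-sonnet"]


def model_matches(model: str, patterns: Iterable[str]) -> bool:
    """Return True if any pattern occurs as a substring of the raw model string."""
    raw = (model or "").strip().lower()
    if not raw:
        return False  # conservative default for missing model names
    for pattern in patterns:
        # Drop a trailing '*' left over from the glob-based tables.
        token = pattern.strip().lower().rstrip("*")
        # Assumed guard: skip empty tokens, which would match anything.
        if token and token in raw:
            return True
    return False


bedrock = "bedrock/anthropic.claude-3-5-sonnet-20241022-v2:0"
print(model_matches(bedrock, PROMPT_CACHE_PATTERNS))
print(model_matches("meta/llama-3.1-70b", PROMPT_CACHE_PATTERNS))
```

Because the check runs against the full raw string, a provider-qualified pattern such as 'openai/gpt-4o' only matches models that carry that exact provider prefix, while a bare family token matches regardless of provider.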

Outcomes

- Clear behavior: if the essential family token appears in the model string, the feature applies
- Fewer special-case patterns and more durable matching across providers
- Restores pre-refactor semantics that worked reliably in practice

Checklist

- Code formatted and linted via pre-commit
- Updated tests for sdk changes; all impacted sdk tests pass locally

Closes #844

Agent Server images for this PR

• GHCR package: https://github.com/OpenHands/agent-sdk/pkgs/container/agent-server

Variants & Base Images

Variant	Base Image	Docs / Tags
golang	`golang:1.21-bookworm`	Link
java	`eclipse-temurin:17-jdk`	Link
python	`nikolaik/python-nodejs:python3.12-nodejs22`	Link

Pull (multi-arch manifest)

docker pull ghcr.io/openhands/agent-server:45b3ece-python

Run

docker run -it --rm \
  -p 8000:8000 \
  --name agent-server-45b3ece-python \
  ghcr.io/openhands/agent-server:45b3ece-python

All tags pushed for this build

ghcr.io/openhands/agent-server:45b3ece-golang
ghcr.io/openhands/agent-server:v1.0.0a5_golang_tag_1.21-bookworm_binary
ghcr.io/openhands/agent-server:45b3ece-java
ghcr.io/openhands/agent-server:v1.0.0a5_eclipse-temurin_tag_17-jdk_binary
ghcr.io/openhands/agent-server:45b3ece-python
ghcr.io/openhands/agent-server:v1.0.0a5_nikolaik_s_python-nodejs_tag_python3.12-nodejs22_binary

The 45b3ece tag is a multi-arch manifest (amd64/arm64); your client pulls the right arch automatically.

Cross-repo impact: Fix: OpenHands/OpenHands#11248

…normalize usage - model_matches now does case-insensitive substring on full raw model - strip trailing '*' in patterns (migration aid) - pattern tables converted to plain substrings (no '*') - drop normalize_model_name and related tests - update tests to reflect substring semantics and Bedrock coverage Fixes #844 Co-authored-by: openhands <[email protected]>

github-actions · 2025-10-23T18:26:14Z

Coverage Report

File	Stmts	Miss	Cover	Missing
TOTAL	11093	4988	55%

report-only-changed-files is enabled. No files were changed during this commit :)

…handling and empty-token skipping - Patterns are now used exactly as provided (lowercased/stripped) - No special handling for '*' or empty tokens Co-authored-by: openhands <[email protected]>

…eature detection - Validate provider-prefixed Bedrock ids and plain vendor-prefixed names - Ensure function-calling and prompt-cache features are enabled for Claude families Co-authored-by: openhands <[email protected]>

…edrock dotted vendor prefixes - Function-calling: adds claude-sonnet-4-5 and claude-sonnet-4.5, and us.anthropic.* examples - Prompt cache: keep only supported families; drop unsupported haiku-4.5 dotted vendor case Co-authored-by: openhands <[email protected]>

… extend tests with dotted vendor forms - Add claude-haiku-4.5 and claude-haiku-4-5 to PROMPT_CACHE_PATTERNS - Expand tests for us.anthropic.* and local names for Haiku 4.5 Co-authored-by: openhands <[email protected]>
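A small sketch of why both the dotted and the dashed Haiku 4.5 spellings end up in the table: with plain substring matching, '4.5' and '4-5' are different tokens, so each spelling needs its own entry. The model names and the `supports_prompt_cache` helper below are hypothetical examples for illustration, not taken from the SDK or from a provider catalog.

```python
# The two entries named in the commit message above.
PROMPT_CACHE_PATTERNS = ["claude-haiku-4.5", "claude-haiku-4-5"]


def supports_prompt_cache(model: str, patterns: list[str]) -> bool:
    """Hypothetical helper: substring check against the raw model string."""
    raw = model.strip().lower()
    return any(token in raw for token in patterns)


# Hypothetical names in the two styles mentioned above.
dashed = "us.anthropic.claude-haiku-4-5-v1:0"
dotted = "claude-haiku-4.5"

both = [supports_prompt_cache(m, PROMPT_CACHE_PATTERNS) for m in (dashed, dotted)]

# With only the dashed entry, the dotted local name is missed.
dashed_only = supports_prompt_cache(dotted, ["claude-haiku-4-5"])
print(both, dashed_only)
```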

github-actions · 2025-10-23T22:31:10Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.


blacksmith-sh · 2025-11-01T12:51:39Z

[Automatic Post]: It has been a while since there was any activity on this PR. @enyst, are you still working on it? If so, please go ahead, if not then please request review, close it, or request that someone else follow up.

enyst · 2025-11-01T13:57:41Z

@OpenHands Merge main into this PR and fix the conflicts.

openhands-ai · 2025-11-01T13:57:48Z

Uh oh! There was an unexpected error starting the job :(

github-actions · 2025-11-01T13:58:42Z

Hi! I started running the integration tests on your PR. You will receive a comment with the results shortly.

github-actions · 2025-11-01T14:03:26Z

🧪 Integration Tests Results

Overall Success Rate: 100.0%
Total Cost: $0.84
Models Tested: 3
Timestamp: 2025-11-01 14:03:24 UTC

📁 Detailed Logs & Artifacts

Click the links below to access detailed agent/LLM logs showing the complete reasoning process for each model. On the GitHub Actions page, scroll down to the 'Artifacts' section to download the logs.

litellm_proxy_claude_sonnet_4_5_20250929: 📥 View & Download Logs
litellm_proxy_gpt_5_mini_2025_08_07: 📥 View & Download Logs
litellm_proxy_deepseek_deepseek_chat: 📥 View & Download Logs

📊 Summary

Model	Success Rate	Tests Passed	Total Tests	Cost
litellm_proxy_claude_sonnet_4_5_20250929	100.0%	7/7	7	$0.73
litellm_proxy_gpt_5_mini_2025_08_07	100.0%	7/7	7	$0.04
litellm_proxy_deepseek_deepseek_chat	100.0%	7/7	7	$0.07

📋 Detailed Results

litellm_proxy_claude_sonnet_4_5_20250929

Success Rate: 100.0% (7/7)
Total Cost: $0.73
Run Suffix: litellm_proxy_claude_sonnet_4_5_20250929_602f6bd_sonnet_run_N7_20251101_135909

litellm_proxy_gpt_5_mini_2025_08_07

Success Rate: 100.0% (7/7)
Total Cost: $0.04
Run Suffix: litellm_proxy_gpt_5_mini_2025_08_07_602f6bd_gpt5_mini_run_N7_20251101_135914

litellm_proxy_deepseek_deepseek_chat

Success Rate: 100.0% (7/7)
Total Cost: $0.07
Run Suffix: litellm_proxy_deepseek_deepseek_chat_602f6bd_deepseek_run_N7_20251101_135910

xingyaoww

lgtm! thank you

This updates the direct completion access to use resp.message and extract TextContent, aligning with the current LLMResponse interface. Co-authored-by: openhands <[email protected]>

…ring matching semantics and add function_calling support flag - Keep substring-based model_matches per PR #879 direction - Restore function calling patterns and features - Align tests to substring semantics and function_calling expectations Co-authored-by: openhands <[email protected]>

…ing semantics - Remove supports_function_calling field and patterns - Remove related tests - Keep substring-based model_matches; keep other feature flags intact Co-authored-by: openhands <[email protected]>

Co-authored-by: openhands <[email protected]>

…mini-latest' and gpt-5 family Co-authored-by: openhands <[email protected]>

openhands-ai · 2025-11-01T23:28:01Z

Looks like there are a few issues preventing this PR from being merged!

GitHub Actions are failing:
- Agent Server

If you'd like me to help, just leave a comment, like

@OpenHands please fix the failing actions on PR #879 at branch `openhands/revert-to-substring-matching`

Feel free to include any additional details that might help me get this PR into a better state.

openhands-ai bot mentioned this pull request Oct 23, 2025

Stop the model-name whack-a-mole: revert to core family substring matching (remove globs, drop normalize) #844

Closed

enyst marked this pull request as draft October 23, 2025 18:51

enyst and others added 4 commits October 23, 2025 18:53

sdk(llm): simplify substring matching further by removing trailing-* …

c3080f8

…handling and empty-token skipping - Patterns are now used exactly as provided (lowercased/stripped) - No special handling for '*' or empty tokens Co-authored-by: openhands <[email protected]>

enyst added the integration-test Runs the integration tests and comments the results label Oct 23, 2025

Merge branch 'main' into openhands/revert-to-substring-matching

602f6bd

enyst added integration-test Runs the integration tests and comments the results and removed integration-test Runs the integration tests and comments the results labels Nov 1, 2025

enyst marked this pull request as ready for review November 1, 2025 16:42

enyst requested a review from xingyaoww November 1, 2025 16:42

xingyaoww approved these changes Nov 1, 2025

enyst and others added 5 commits November 1, 2025 22:25

examples: access LLMResponse.message in registry example

a0f9cc4

This updates the direct completion access to use resp.message and extract TextContent, aligning with the current LLMResponse interface. Co-authored-by: openhands <[email protected]>

cleanup: drop supports_function_calling feature per main, keep substr…

8b83501

…ing semantics - Remove supports_function_calling field and patterns - Remove related tests - Keep substring-based model_matches; keep other feature flags intact Co-authored-by: openhands <[email protected]>

llm(utils): add 'codex-mini-latest' to RESPONSES_API_PATTERNS per main

dd83f40

Co-authored-by: openhands <[email protected]>

tests(llm): add coverage for supports_responses_api including 'codex-…

379e72d

…mini-latest' and gpt-5 family Co-authored-by: openhands <[email protected]>

Merge branch 'main' into openhands/revert-to-substring-matching

737d2f7

enyst enabled auto-merge (squash) November 2, 2025 10:34

enyst merged commit 21c4d27 into main Nov 2, 2025
13 checks passed

enyst deleted the openhands/revert-to-substring-matching branch November 2, 2025 10:34
